GéDériF: Automatic Generation and Analysis of Morphologically Constructed Lexical Resources
Abstract
One of the most frequent problems in text retrieval comes from the large number of words encountered which are not listed in general-language dictionaries. However, these words are very often morphologically complex, and as such have a meaning which is predictable from their structure. Furthermore, such words typically belong to specialized language uses (e.g. scientific, philosophical or media technolects). Consequently, tools for listing and analysing such words can help enrich a terminological database. The purpose of this paper is to present a system that automatically generates morphologically complex French lexical items which are not listed in dictionaries, and that also provides a structural and semantic analysis of these items. The output of this system is a morphological database (currently in progress) which forms a powerful lexical resource. It should prove useful in Natural Language Processing (NLP) and Information Retrieval (IR) applications. Indeed, the system generates a potentially infinite set of complex (derived) lexical units (henceforth CLUs), automatically associated with a rich array of morpho-semantic features, and is thus capable of dealing with morphologically complex structures which are unlisted in dictionaries.
Similar resources
Definition patterns for predicative terms in specialized lexical resources
The research presented in this paper is part of a larger project on the semi-automatic generation of definitions of semantically-related terms in specialized resources. The work reported here involves the formulation of instructions to generate the definitions of sets of morphologically-related predicative terms, based on the definition of one of the members of the set. In many cases, it is ass...
Crowd-sourcing evaluation of automatically acquired, morphologically related word groupings
The automatic discovery and clustering of morphologically related words is an important problem with several practical applications. This paper describes the evaluation of word clusters carried out through crowd-sourcing techniques for the Maltese language. The hybrid (Semitic-Romance) nature of Maltese morphology, together with the fact that no large-scale lexical resources are available for M...
Automatic Text Summarization Using Lexical Clustering
The goal of automatic text summarization is to reduce the size of a document while preserving its content. We investigate a summarization method which uses not only statistical features but also the contextual meaning of documents by using lexical clustering. We present a new method to compute lexical cluster in a text without high cost knowledge resources; the WordNet thesaurus. Summarization ...
Building a Rich Large-scale Lexical Base for Generation
Most large lexical resources have been developed with language interpretation in mind and cannot be used directly for generation. We present a rich large-scale lexical base for generation, constructed by merging various linguistic resources. Our approach meets the needs of language generation systems by providing the facilities for mapping from semantic concepts to verb/sense pairs, for identi...
Improvement of generative adversarial networks for automatic text-to-image generation
This research concerns the use of deep learning tools and image processing technology in the automatic generation of images from text. Previous research has used one sentence to produce images. In this research, a memory-based hierarchical model is presented that uses three different descriptions, presented in the form of sentences, to produce and improve the image. The proposed ...
Publication date: 2000